Abstract- The brain’s representation of visual information
depends greatly on the behavioral relevance of the viewed
stimuli.  While  in  some  instances  behavioral  significance  is 
derived  from  conspicuity,  in  many  situations  significance 
depends  on  top-down  factors  such  as  the  viewer’s  goals  and 
knowledge.  Studies  combining  neural  recordings  and 
behavioral  observations  have  begun  to  elucidate  how  the  brain 
selects  visual  stimuli  based  on  top-down  information.  While 
many  visual  areas  of  the  brain  that  are  selective  for  visual 
attributes  participate  in  the  selection  process,  the  outcome  of 
the  selection  process  across  these  areas  appears  to  be 
represented  in  structures  like  the  frontal  eye  field,  a  key  stage 
in  the  transformation  of  visual  selection  into  a  command  to 
move  the  eyes.  Evidence  shows  that  the  frontal  eye  field 
exhibits  all  the  characteristics  of  a  visual  salience  map  in  which 
the  behavioral  significance  of  stimuli  derived  from  bottom-up 
and  top-down  influences  is  represented.  The  patterns  of 
neural  modulation  in  structures  like  the  frontal  eye  field  can  be 
used  to  design  more  efficient  machine-vision  algorithms  for 
target  selection. 

Introduction

Despite  our  subjective  feeling  that  we  “see”  everything 
within  our  field  of  vision,  not  every  part  of  a  visual  scene  is 
processed  to  the  same  degree  [1],  [2].  Instead,  we  attend  to 
objects  of  interest  while  ignoring  irrelevant  ones.  Influences 
on  visual  selection  can  be  broadly  divided  into  bottom-up 
and  top-down  categories.  While  bottom-up  influences  are 
derived  from  the  intrinsic  conspicuity  of  features  in  a  visual 
scene  (e.g.,  a  bright  stimulus),  top-down  influences  are 
derived  from  the  goals  and  knowledge  of  the  viewer.  A 
large  body  of  literature  shows  that  eye  movements  reflect 
cognitive  processes  during  various  viewing  contexts,  which 
include  visual  search  [3],  natural  scene  perception  [4],  and 
reading  [5]. 

Neural  mechanisms  underlying  visual  selection  have 
been  investigated  extensively.  Correlates  of  visual  selection 
appear  to  be  reflected  in  nearly  all  visual  or  visual- 
association  areas  (for  reviews,  see  [6],  [7]),  and  perhaps 
even as early as V1 (e.g., [8]). Except for those at the
earliest  stages  of  the  visual  system,  most  of  these  areas 
project  to  the  frontal  eye  field  (FEF)  [9],  which  appears  to 
be  one  of  the  highest  points  of  convergence  of  the  dorsal 
and  ventral  visual  information  streams  in  the  brain  [10].  In 
turn,  FEF  projects  strongly  to  brainstem  oculomotor 
structures  [11],  and  thus  is  ideally  positioned  to  transform 
the  outcome  of  visual  processing  into  a  command  to  move 
the eyes. Neural correlates of bottom-up selection in FEF
have been reviewed [12]. This paper focuses on studies
demonstrating  selection  in  FEF  based  on  top-down  factors, 
completing  the  argument  that  this  area  contains  a  visual 
salience  map  in  which  behavioral  significance,  regardless  of 
its  source,  is  represented. 

Top-down  influences  during  feature  search 
A.  Expectancy 

Cognitive  strategies  can,  in  some  instances,  override  the 
effects  of  conspicuity  (e.g.,  [13]).  For  example,  experts  are 
more  likely  than  novices  to  ignore  conspicuous,  but  non- 
informative  features  of  a  visual  image  from  their  area  of 
expertise  (e.g.,  [14]).  Experience  and  training  also  influence 
the  search  strategy  of  monkeys,  as  well  as  the  concomitant 
neural  selection  process  in  FEF  [15].  Monkeys  trained  to 
make  a  saccade  to  the  oddball  stimulus  in  complementary 
color  search  arrays  adopt  a  strategy  of  shifting  their  gaze 
according  to  visual  conspicuity.  In  contrast,  monkeys 
trained  exclusively  with  one  of  the  two  complementary 
search  arrays  adopt  a  strategy  of  ignoring  stimuli  with  the 
distractor  feature,  even  when  a  stimulus  with  that  feature 
becomes the oddball in the visual search array (Fig. 1A).

Fig. 1. Effect of feature expectation on gaze behavior and neural
modulation in FEF during a color popout search. A. Eye movements of
one monkey during search with the learned search array (left) and during
viewing of the complementary search array (right). Each dot represents an
eye movement sample. B. Response of one FEF visuomovement neuron
while this monkey performed search with the learned search array. Spike
density functions during correctly performed trials when the target was in
the neuron’s response field (solid line) and when distractors were in its
response field (dashed line) are shown superimposed. (Modified from [15])

In monkeys using the strategy of searching for the
learned target color, about half of FEF visuomovement
neurons discriminated the target from distractors as soon as
they responded (Fig. 1B). In other words, FEF neurons
exhibited  an  apparent  feature  selectivity  in  their  initial 
response  unlike  what  had  been  observed  before  in  this  area 
[16].  Furthermore,  this  selectivity  manifested  itself  as  a 
suppression  of  the  response  to  the  learned  distractor,  rather 
than  an  enhancement  of  the  response  to  the  learned  target. 
This  finding  is  consistent  with  the  suppression  of  distractor 
information  during  color  popout  visual  search  determined 
psychophysically  with  human  observers  [17]. 

In  the  study  mentioned  above,  the  search  process  was 
affected  by  expectation  of  a  non-spatial  stimulus  property 
(i.e.,  color).  Recently,  recordings  in  superior  colliculus  have 
shown  that  the  initial  activity  of  buildup  neurons  in  this 
structure  is  modulated  by  the  expectation  of  target  location 
[18].  Similarly,  reward  (or  gain)  expectancy  has  been 
shown  to  influence  the  initial  response  of  neurons  in  the 
lateral  intraparietal  area  (LIP)  [19].  Both  the  superior 
colliculus  and  area  LIP  have  also  been  surmised  to  contain  a 
visual  salience  map  [20],  [21]. 

B.  Singleton  distractors 

Experiments  with  human  observers  have  shown  that 
when  searching  for  an  oddball  stimulus  in  one  feature 
dimension,  it  is  impossible  to  ignore  an  oddball  stimulus  in 
another,  irrelevant  feature  dimension  [22].  Similarly,  during 
a  shape  feature  search,  monkeys  are  adversely  affected  by 
the  presence  of  a  distractor  that  differs  from  all  other  stimuli 
in  color  [23]  (Fig.  2A).  The  presence  of  the  color  singleton 
distractor  results  in  longer  saccade  latencies  to  the  target,  as 
well  as  an  increase  in  error  rates  (Fig.  2B).  Furthermore, 
when  monkeys  make  an  error  during  search  with  a  singleton 
distractor,  most  of  the  errors  are  accounted  for  by  saccades 
to  the  singleton  distractor. 

Neural  recordings  in  FEF  during  shape  feature  search 
with a singleton distractor show that selection based on both
conspicuity and top-down guidance is reflected in this area
[23]  (Fig.  2C).  Similar  to  previous  observations  with 
monkeys  that  are  not  biased  towards  searching  for  a 
particular  shape  or  color,  the  sampled  population  of  FEF 
visuomovement  neurons  did  not  initially  discriminate  the 
target  from  distractors.  When  the  activity  of  these  neurons 
became  selective,  the  target  (i.e.,  oddball  shape)  elicited  the 
highest  activation,  despite  not  being  the  most  conspicuous 
stimulus  in  the  display.  Furthermore,  the  color  singleton 
distractor elicited a higher activation than the other, non-salient
distractors, presumably due to the strong attentional
capture  by  highly  conspicuous  stimuli.  Thus,  FEF  in  this 
task  reflects  both  the  top-down  goal  of  the  search  (i.e.,  the 
stimulus  with  the  oddball  shape),  as  well  as  the  bottom-up 
attentional  capture  by  the  irrelevant  singleton. 
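
The ordering of responses described above (target first, then the color singleton, then the non-salient distractors) can be illustrated with a toy salience map in which each stimulus receives a weighted sum of a bottom-up conspicuity term and a top-down relevance term. This is only a sketch under assumed values: the function name `salience`, the two weights, and the per-stimulus scores are hypothetical and are not taken from [23].

```python
# Toy salience map for a shape search with a color singleton distractor.
# All numbers are illustrative assumptions, not measured values.

def salience(bottom_up, top_down, w_bottom_up=0.4, w_top_down=0.6):
    """Weighted sum of bottom-up conspicuity and top-down relevance."""
    return w_bottom_up * bottom_up + w_top_down * top_down

# (bottom-up conspicuity, top-down relevance), each assumed to lie in [0, 1]
stimuli = {
    "target (oddball shape)": (0.5, 1.0),
    "color singleton distractor": (1.0, 0.0),
    "non-salient distractor": (0.1, 0.0),
}

salience_map = {name: salience(bu, td) for name, (bu, td) in stimuli.items()}

# Rank stimuli from highest to lowest salience; the peak is the saccade goal.
ranked = sorted(salience_map, key=salience_map.get, reverse=True)
saccade_target = ranked[0]
```

With these assumed weights the target wins even though the color singleton has the highest conspicuity, while the singleton still ranks above the non-salient distractors. Shifting weight toward the bottom-up term would make the singleton the peak instead, which is one way to think about erroneous saccades to the singleton.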

While  items  that  differ  in  one  or  more  features  from 
neighboring  items  draw  attention,  items  that  appear  as 
sudden-onsets capture attention even more strongly [24].
A recent study in area LIP has described the neural
correlates  of  the  attentional  capture  by  sudden-onsets  [25]. 
In  this  study,  stimuli  were  either  presented  in  the  receptive 
field  of  area  LIP  neurons  as  sudden-onsets,  or  were  brought 
into  their  receptive  field  by  a  saccade.  While  neurons  in  this 
area  responded  vigorously  to  the  sudden  appearance  of 
stimuli  in  their  receptive  field,  the  same  neurons  exhibited 
weak  or  no  responses  to  stable  stimuli  brought  into  their 
receptive  field  by  a  saccade.  Thus,  the  strong  initial 
responses  of  area  LIP  neurons  to  stimuli  that  are  presented 
in  their  receptive  field  appear  to  reflect  the  strong  attentional 
capture  by  sudden-onsets,  rather  than  a  passive  sensory 
response  to  stimuli  in  their  receptive  field.  Further  research 
is  needed  to  determine  whether  the  strong,  seemingly  visual 
responses  commonly  observed  in  FEF  are  also  due  to 
attentional  capture  by  stimuli  appearing  suddenly.  A  study 
in  which  visual  responses  were  recorded  in  FEF  while 
monkeys  freely  scanned  natural  images  supports  this 
possibility  [26].  However,  even  if  this  were  the  case,  the 
results  of  recordings  in  the  FEF  of  monkeys  exhibiting  a 
bias  towards  a  particular  feature  [15]  (Fig.  1)  suggest  that 
the  magnitude  of  the  attentional  capture  by  sudden-onsets 
can  be  modulated  by  top-down  influences. 

Fig. 2. Effect of a color singleton distractor on performance and neural
modulation  in  FEF  during  a  shape  feature  search.  A.  Task  display.  The 
arrow  illustrates  the  saccade  to  the  target  (i.e.,  oddball  shape).  B.  Mean 
saccade  latency  and  error  rate  during  the  shape  search  task  without  the 
color  singleton  (gray  bars)  and  with  the  color  singleton  (black  bars).  The 
proportion of saccades to the color singleton in the “with singleton” search
condition is represented by the white stripes. C. Pooled normalized
response  of  a  population  of  FEF  visuomovement  neurons  during  correctly 
performed  feature  search  trials  with  a  singleton  distractor.  Responses  to 
the  target  (thick  solid  line),  to  the  singleton  distractor  (dashed  line),  and  to 
the  non-salient  distractors  (thin  solid  line)  are  shown  superimposed. 

Conjunction search

In  many  real-world  situations,  an  object  of  interest 
cannot  be  located  based  on  conspicuity,  and  a  memory  of 
the  object  is  required  to  locate  it  (e.g.,  “searching  for  a  face 
in  the  crowd”).  An  analogous  situation  is  obtained  during 
conjunction  search  where  the  target  is  defined  by  one 
combination  of  possible  features  (e.g.,  a  red  cross),  and 
distractors  are  formed  by  other  possible  combinations  (e.g., 
a green cross, a red circle, or a green circle) (Fig. 3A).
Early experiments reported a powerful dichotomy between
feature and conjunction search [27], and played an
important role in the development of theories of visual
attention. Based mainly on measures of reaction time,
feature search appeared to be effortless, while conjunction
search appeared to be effortful and attentionally demanding,
as evidenced by an increase in the time to detect the
presence of the target as the number of distractors in the
display was increased. However, later experiments showed
that conjunction search can be performed efficiently [28],
[29], and suggested that the search display can be processed
in parallel to identify stimuli with the desired features.
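
The set-size effect described above is commonly summarized as the slope of a straight-line fit of reaction time against the number of items in the display: a slope near zero indicates efficient search, and a steep slope indicates inefficient search. The sketch below computes that slope by ordinary least squares. The function name `search_slope` and the reaction times are hypothetical values chosen only to illustrate the calculation; they are not data from [27]-[29].

```python
# Least-squares slope of reaction time (ms) against display set size.
# The reaction times below are made-up illustrative values.

def search_slope(set_sizes, reaction_times_ms):
    """Return the slope (ms per item) of a least-squares line fit."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx

set_sizes = [4, 8, 16]
feature_rt_ms = [450.0, 452.0, 456.0]      # nearly flat: efficient search
conjunction_rt_ms = [500.0, 600.0, 800.0]  # steep: inefficient search

feature_slope = search_slope(set_sizes, feature_rt_ms)
conjunction_slope = search_slope(set_sizes, conjunction_rt_ms)
```

A conjunction search that is performed efficiently, as reported in the later experiments, would show a slope closer to the feature-search case than to the steep case assumed here.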

To the extent that attention and eye movements are
functionally related, such a parallel search strategy predicts
that subjects would be more likely to shift gaze to a
distractor that shares a target feature (i.e., “similar
distractor”) than to a distractor that has no features in
common with the target (i.e., “opposite distractor”). This
prediction has been confirmed with both humans [30] and
monkeys [31] searching for a target defined by the
combination of color and shape (Fig. 3B). Interestingly,
when target properties remained the same within an
experimental session, but changed across experimental
sessions, there was evidence that the history of target
properties affected behavior [31]. In addition to being
influenced by visual similarity to the target, errant saccades
tended to land on the distractor that was the search target
during the previous session (Fig. 3B). This tendency
manifested itself across sessions at least a day apart and
persisted throughout a session. Although of much longer
time course, this phenomenon may be related to the
perceptual priming observed during popout search with both
human [32] and monkey [31] subjects, and was thus referred
to as long-term priming.

Recordings in FEF during conjunction search further
support the hypothesis that this area represents a visual
salience map [33]. After an initially non-selective response,
FEF neurons not only discriminated the target from
distractors, but also discriminated among the distractors
based on their visual similarity to the target and the history
of target properties across sessions (Fig. 3C). In other
words, while the highest activation was associated with the
target, distractors similar to the target elicited stronger
responses than the distractor that shared no target features.
Furthermore, a similar distractor primed by virtue of being
the target of the previous session elicited a stronger response
than an unprimed similar distractor. In recording sessions
during which the opposite distractor was primed (not
shown), there was a relative increase in its neural
representation, although it was still weaker than that of
distractors similar to the target. This observation is
consistent with the fact that, in those sessions, there was a
relative increase in erroneous saccades to the opposite
distractor, although these saccades were not as frequent as
those to distractors similar to the target. Thus, this study
shows that neural modulation in FEF reflects a variety of
top-down influences, and predicts gaze patterns during a
complex visual search.
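
The graded pattern described above (target, then primed similar distractor, then unprimed similar distractor, then opposite distractor) can be reproduced by a toy priority score that counts the features a stimulus shares with the target and adds a smaller bonus for the target of the previous session. This is a sketch under stated assumptions, not the model used in [33]: the function name `priority`, the weights `w_feature` and `w_priming`, and the choice of previous target are all hypothetical.

```python
# Toy priority score for a color-shape conjunction search.
# Weights are illustrative assumptions, not fitted to any data.

def priority(stimulus, target, previous_target, w_feature=1.0, w_priming=0.5):
    """Shared-feature count plus a bonus if the stimulus was the previous target."""
    shared = sum(a == b for a, b in zip(stimulus, target))
    primed = stimulus == previous_target
    return w_feature * shared + w_priming * primed

target = ("red", "cross")
previous_target = ("red", "circle")  # hypothetical target of the previous session
display = [
    ("red", "cross"),     # target
    ("red", "circle"),    # similar distractor, primed
    ("green", "cross"),   # similar distractor, unprimed
    ("green", "circle"),  # opposite distractor
]

scores = {s: priority(s, target, previous_target) for s in display}
ranked = sorted(display, key=scores.get, reverse=True)
```

As long as the assumed priming bonus is smaller than the weight of one shared feature, a primed opposite distractor gains priority but still stays below the similar distractors, which matches the qualitative pattern reported for the sessions in which the opposite distractor was primed.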

Neural selection has also been investigated in area LIP
during a complex visual search in which no stimulus was
conspicuous [25]. In this study mentioned earlier, neurons
did not automatically respond to the entry of stable stimuli
into their receptive field after a saccade. However, when
the stimulus in the receptive field was the cued search
target, neurons responded to signal its presence. This study
suggests  that  the  representation  of  visual  information  in  area 
LIP  is  normally  sparse,  with  only  behaviorally  relevant 
stimuli  being  strongly  represented. 

Fig. 3. Gaze behavior and neural modulation in FEF during conjunction
search. A. Conjunction search display. The arrow illustrates the saccade to
the target. B. Incidence of saccades to distractors representing the
probability of shifting gaze to each of the distractor types during error
trials. A similar distractor refers to a distractor that shares a target feature
(i.e., same color or same shape); a primed distractor refers to a distractor
that was the search target during the previous session. C. Pooled
normalized response of a population of FEF visuomovement neurons
during correctly performed conjunction search trials (i.e., initial saccade
was made to the target). The response to the target (thick solid line), to the
primed similar distractor (thick dashed line), to the unprimed similar
distractor (thin solid line), and to the opposite distractor (thin dotted line)
are shown superimposed.

Conclusion

Since  its  proposal  by  Koch  and  Ullman  [34]  nearly  two 
decades  ago,  the  concept  of  a  visual  salience  map  has 
figured  prominently  in  most  models  of  covert  attention  and 
overt  saccade  production.  Evidence  strongly  supports  the 
view that FEF contains such a salience map. Neurons in
this  area  are  not  selective  for  visual  attributes,  but  represent 
the  behavioral  relevance  of  stimuli  derived  from  bottom-up 
influences,  as  well  as  top-down  influences  involving  goals 
and  knowledge.  Neural  modulation  in  FEF  accurately 
predicts  the  gaze  patterns  observed  in  a  variety  of  search 
tasks  that  involve  different  degrees  of  bottom-up  and  top- 
down  influences.  Furthermore,  a  recent  study  shows  that 
small  populations  of  FEF  neurons  can  account  for  reaction 
time  and  error  rate  across  a  wide  range  of  search  efficiency 
[35].  Thus,  FEF  represents  an  ideal  area  in  which  to  test 
theories  on  the  mechanisms  of  visual  selection,  as  well  as 
response  production.  The  mechanisms  of  visual  selection 
observed  in  FEF,  which  appear  to  reflect  processing 
throughout  the  visual  system,  can  be  used  to  design 
machine-vision systems that operate with an efficiency and
reliability  closer  to  that  of  the  primate  visuomotor  system. 